Minimal gene set discovery in single-cell mRNA-seq datasets with ActiveSVM
Authors
Abstract
Sequencing costs currently prohibit the application of single-cell mRNA-seq to many biological and clinical analyses. Targeted single-cell mRNA-sequencing reduces sequencing costs by profiling reduced gene sets that capture biological information with a minimal number of genes. Here we introduce an active learning method that identifies minimal but highly informative gene sets that enable the identification of cell types, physiological states and genetic perturbations in single-cell data using a small number of genes. Our active feature selection procedure generates minimal gene sets from single-cell data by employing an active support vector machine (ActiveSVM) classifier. We demonstrate that ActiveSVM feature selection identifies gene sets that enable ~90% cell-type classification accuracy across, for example, cell atlas and disease-characterization datasets. The discovery of small but highly informative gene sets should enable reductions in the number of measurements necessary for application of single-cell mRNA-seq to clinical tests, therapeutic discovery and genetic screens.
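To make the idea of active feature selection with a support vector machine concrete, the following is a minimal sketch, not the published ActiveSVM implementation. It assumes binary labels in {-1, +1}, a small dense expression matrix given as nested lists, and a simplified scoring rule (the magnitude of the hinge-loss gradient over cells that currently violate the margin). The function names `margin`, `train_linear_svm` and `select_genes`, the hyperparameters, and the toy data are all hypothetical choices made for illustration.

```python
def margin(x, y_i, w, b):
    """Signed margin y * (w . x + b), using only the genes present in w."""
    return y_i * (sum(w_g * x[g] for g, w_g in w.items()) + b)


def train_linear_svm(X, y, genes, epochs=500, lr=0.01, lam=0.01):
    """Fit a linear SVM on the given genes by full-batch subgradient descent
    on the L2-regularised hinge loss (deterministic, no random state)."""
    w = {g: 0.0 for g in genes}
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = {g: lam * w[g] for g in genes}
        grad_b = 0.0
        for x, y_i in zip(X, y):
            if margin(x, y_i, w, b) < 1:  # cell violates the margin
                for g in genes:
                    grad_w[g] -= y_i * x[g] / n
                grad_b -= y_i / n
        for g in genes:
            w[g] -= lr * grad_w[g]
        b -= lr * grad_b
    return w, b


def select_genes(X, y, k):
    """Greedily pick up to k genes. Each round looks only at the cells the
    current classifier handles poorly and adds the unused gene with the
    largest hinge-loss gradient magnitude on those cells (assumed criterion)."""
    n_genes = len(X[0])
    selected, w, b = [], {}, 0.0
    for _ in range(min(k, n_genes)):
        hard = [i for i in range(len(X)) if margin(X[i], y[i], w, b) < 1]
        if not hard:  # every cell is classified with margin >= 1
            break
        candidates = [g for g in range(n_genes) if g not in selected]
        best = max(candidates,
                   key=lambda g: abs(sum(y[i] * X[i][g] for i in hard)))
        selected.append(best)
        w, b = train_linear_svm(X, y, selected)
    return selected


# Toy data: 6 cells x 4 genes. Gene 0 is a strong marker for class +1,
# gene 1 is constant, gene 2 is uninformative, gene 3 is a weak marker.
X = [[5, 3, 1, 0], [6, 3, 0, 1], [5, 3, 2, 0],
     [0, 3, 2, 2], [1, 3, 0, 1], [0, 3, 1, 2]]
y = [1, 1, 1, -1, -1, -1]

# With no genes selected every cell violates the margin, and gene 0 has the
# largest initial score (15), so it is chosen first.
print(select_genes(X, y, k=2))
```

The point of the sketch is the loop structure: the classifier is retrained on the current gene set, and only the cells it still handles poorly inform the choice of the next gene. Multi-class cell-type labels would need an extension such as one-vs-rest, which is omitted here.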
Similar resources
Discovery and quantification of transcript variants with SQUARE™ mRNA-Seq
The vast majority of genes are alternatively spliced and produce a variety of mature transcripts. These transcript variants often encode proteins with different structures and functions, and changes in the expression of variants from the same gene can lead to profound biological effects (reviewed in 1). Various transcriptional events, including splicing from alternative 5’ or 3’ splice-sites, e...
Discovery of Protein–lncRNA Interactions by Integrating Large-Scale CLIP-Seq and RNA-Seq Datasets
Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanism and functions of most of lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein-ln...
Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq
In recent years, RNA-seq has become a very competitive alternative to microarrays. In RNA-seq experiments, the expected read count for a gene is proportional to its expression level multiplied by its transcript length. Even when two genes are expressed at the same level, differences in length will yield differing numbers of total reads. The characteristics of these RNA-seq experiments create a ...
goseq: Gene Ontology testing for RNA-seq datasets
Bioconductor R packages such as Rsubread allow for the summarization of mapped reads into a table of counts, such as reads per gene. From there, several packages exist for performing differential expression analysis on summarized data (e.g., edgeR [Robinson and Smyth, 2007, 2008; Robinson et al., 2010]). goseq will work with any method for determining differential expression and as such different...
mRNA-seq with agnostic splice site discovery for nervous system transcriptomics tested in chronic pain.
mRNA-seq is a paradigm-shifting technology because of its superior sensitivity and dynamic range and its potential to capture transcriptomes in an agnostic fashion, i.e., independently of existing genome annotations. Implementation of the agnostic approach, however, has not yet been fully achieved. In particular, agnostic mapping of pre-mRNA splice sites has not been demonstrated. The present s...
Journal
Journal title: Nature Computational Science
Year: 2022
ISSN: 2662-8457
DOI: https://doi.org/10.1038/s43588-022-00263-8